Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts #5302
setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every
non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles
(app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked ~30
releases looking for a non-existent app-*-linux-x64-cpu asset, exited
the prebuilt planner with "no compatible Linux prebuilt asset was
found", and fell through to a source build. Free CI runners
(ubuntu-latest with no GPU) hit this on every install, and anyone
running Studio on a Linux laptop without an NVIDIA GPU paid the
~3 minute cmake+make cost on first install.
ggml-org publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release
and install_llama_prebuilt.py already knows how to fetch it: when
called with --published-repo ggml-org/llama.cpp, the Linux x86_64 +
not has_usable_nvidia branch in direct_upstream_release_plan picks up
that asset directly. The fix is purely on the routing side.
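As a rough illustration of the asset naming above, the following sketch builds a download URL for the upstream CPU tarball. It assumes the standard GitHub release download layout; the helper name upstream_cpu_asset_url and the tag b0000 are hypothetical, and install_llama_prebuilt.py may resolve assets differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of setup.sh): print the download URL of the
# upstream Linux x86_64 CPU asset for the release tag given as $1.
# Assumes the usual GitHub layout: releases/download/<tag>/<asset>.
upstream_cpu_asset_url() {
    tag=$1
    echo "https://github.com/ggml-org/llama.cpp/releases/download/${tag}/llama-${tag}-bin-ubuntu-x64.tar.gz"
}

# "b0000" is a placeholder tag, not a real release.
upstream_cpu_asset_url b0000
```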
Tighten the gate so a Linux host routes to ggml-org only when it is
x86_64 and has no GPU detection tool installed (nvidia-smi, rocminfo,
amd-smi, hipconfig, hipinfo). Everything else stays on the current
path:
- macOS: already on ggml-org, unchanged
- Windows: already on ggml-org via setup.ps1, unchanged
- Linux CUDA: nvidia-smi present -> unslothai/llama.cpp, unchanged
- Linux ROCm: rocminfo / amd-smi / hipconfig / hipinfo present
  -> unslothai/llama.cpp -> source build with HIP, unchanged
- Linux Intel / Vulkan / SYCL: no NVIDIA / AMD tools, hits the new
  ggml-org route, gets upstream CPU asset (same as today's
  source-build CPU output, ~3 min faster)
- Linux arm64 / s390x: not x86_64 -> unslothai/llama.cpp ->
  source build, unchanged
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c84a012847
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if command -v "$_GPU_TOOL" >/dev/null 2>&1; then
        _LINUX_HAS_GPU=true
Probe usable GPUs instead of tool presence
On Linux x86_64 CPU-only environments that still have GPU utilities on PATH, such as CUDA-based Docker images run without --gpus or hosts with CUDA_VISIBLE_DEVICES hiding all devices, this command -v nvidia-smi check routes setup back to unslothai/llama.cpp. The Python installer already distinguishes this case as has_usable_nvidia=false, but with the unsloth repo it then scans CUDA-only Linux assets and falls back to a source build, so the new CPU prebuilt fast path is skipped exactly for these CPU-only installs. Please make this gate use the same active GPU probing semantics as detect_host() or defer the routing until after that detection.
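One way to probe for a usable device rather than tool presence is sketched below. This is a minimal sketch under stated assumptions, not the semantics of detect_host(): the function name has_usable_nvidia is borrowed from the Python flag for illustration only, the check relies on nvidia-smi -L printing one "GPU <n>: ..." line per visible device, and treating an empty or -1 CUDA_VISIBLE_DEVICES as "no devices" is an assumption, since nvidia-smi itself does not honor that variable.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if nvidia-smi exists AND reports a device.
has_usable_nvidia() {
    # Assumption: an explicitly empty or "-1" CUDA_VISIBLE_DEVICES hides
    # every device from CUDA, so treat the host as CPU-only.
    if [ "${CUDA_VISIBLE_DEVICES+set}" = "set" ]; then
        case "$CUDA_VISIBLE_DEVICES" in
            ""|-1) return 1 ;;
        esac
    fi
    # Tool presence alone is not enough (e.g. CUDA image run without --gpus).
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # `nvidia-smi -L` lists devices, one "GPU <n>: ..." line each.
    nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

if has_usable_nvidia; then
    echo "usable NVIDIA GPU"
else
    echo "no usable NVIDIA GPU"
fi
```

A gate built on this probe would send a CUDA image started without --gpus down the CPU route, which is the case the comment describes.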
Code Review
This pull request updates studio/setup.sh to improve the selection of prebuilt binaries for CPU-only Linux x86_64 hosts by routing them to the ggml-org/llama.cpp repository. This prevents these hosts from attempting to download non-existent CPU assets from the Unsloth repository and falling back to source builds. The reviewer suggested combining the conditional logic for Darwin and CPU-only Linux into a single block using modern Bash syntax to improve code conciseness.
    if [ "$_HOST_SYSTEM" = "Darwin" ]; then
        _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
    elif [ "$_HOST_SYSTEM" = "Linux" ] \
        && [ "$_HOST_MACHINE" = "x86_64" ] \
        && [ "$_LINUX_HAS_GPU" = false ]; then
        _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
    else
        _HELPER_RELEASE_REPO="unslothai/llama.cpp"
    fi
To improve conciseness and avoid repeating the assignment to _HELPER_RELEASE_REPO, you can combine the conditions for Darwin and CPU-only Linux into a single if block. Using the [[ ... ]] construct is also more modern and readable for complex conditions in Bash.
    -if [ "$_HOST_SYSTEM" = "Darwin" ]; then
    -    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
    -elif [ "$_HOST_SYSTEM" = "Linux" ] \
    -    && [ "$_HOST_MACHINE" = "x86_64" ] \
    -    && [ "$_LINUX_HAS_GPU" = false ]; then
    -    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
    -else
    -    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
    -fi
    +if [[ "$_HOST_SYSTEM" == "Darwin" || ( "$_HOST_SYSTEM" == "Linux" && "$_HOST_MACHINE" == "x86_64" && "$_LINUX_HAS_GPU" == false ) ]]; then
    +    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
    +else
    +    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
    +fi
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 449f84fb57
    # x86_64 routes to ggml-org for bin-ubuntu-x64.tar.gz. Anything with a
    # GPU tool installed stays on unslothai (CUDA bundle / ROCm source build).
    _LINUX_HAS_GPU=false
    for _GPU_TOOL in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do
Include compiler probes in GPU routing
On Linux x86_64 hosts where the CUDA/ROCm compiler is installed but these runtime utilities are not on PATH (for example nvcc under /usr/local/cuda/bin or hipcc under /opt/rocm/bin), this loop leaves _LINUX_HAS_GPU=false and the new branch installs the upstream CPU tarball successfully. That suppresses the existing source-build path that explicitly checks those compiler locations and enables -DGGML_CUDA=ON / -DGGML_HIP=ON later in this same script, so those environments silently lose GPU-enabled llama.cpp instead of building it as before.
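A compiler probe of the kind this comment asks for might look like the sketch below. It is illustrative only: the function name has_gpu_compiler is hypothetical, the default directories are the example locations named in the comment, and the optional arguments exist only so the check can be pointed at other directories when testing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a CUDA or ROCm compiler is available,
# either on PATH or in the install locations named in the review comment.
#   $1 = CUDA bin directory (default /usr/local/cuda/bin)
#   $2 = ROCm bin directory (default /opt/rocm/bin)
has_gpu_compiler() {
    cuda_bin="${1:-/usr/local/cuda/bin}"
    rocm_bin="${2:-/opt/rocm/bin}"
    # Compilers already on PATH.
    command -v nvcc >/dev/null 2>&1 && return 0
    command -v hipcc >/dev/null 2>&1 && return 0
    # Compilers installed but not on PATH.
    [ -x "$cuda_bin/nvcc" ] && return 0
    [ -x "$rocm_bin/hipcc" ] && return 0
    return 1
}

if has_gpu_compiler; then
    echo "GPU compiler found: keep the source-build path"
else
    echo "no GPU compiler found"
fi
```

Folding a check like this into the gate would keep hosts that have only nvcc or hipcc installed on the existing source-build path.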
* Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts
* Tighten routing comment in studio/setup.sh
Summary
studio/setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles (app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked roughly 30 releases looking for a non-existent app-*-linux-x64-cpu asset, exited the prebuilt planner with "no compatible Linux prebuilt asset was found", and fell through to a source build. Every free ubuntu-latest runner hits this, and every Linux laptop without an NVIDIA GPU pays a ~3 minute cmake + make cost on first install.
ggml-org/llama.cpp publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release, and studio/install_llama_prebuilt.py already knows how to fetch it: when called with --published-repo ggml-org/llama.cpp, the direct_upstream_release_plan branch at host.is_linux and host.is_x86_64 and not host.has_usable_nvidia picks up that asset directly (install_llama_prebuilt.py:1313-1326). The bug was purely in the routing.
Fix
Tighten the gate in studio/setup.sh so a Linux host routes to ggml-org/llama.cpp only when it is x86_64 and has no GPU detection tool installed (nvidia-smi, rocminfo, amd-smi, hipconfig, hipinfo). Everything else stays on the current path.
Routing matrix
Only one cell flips. Everything else is identity.
- macOS: bin-macos-{arm64,x64}.tar.gz
- Windows CPU: bin-win-cpu-x64.zip (via setup.ps1)
- Windows CUDA: bin-win-cuda-*.zip
- Windows HIP: bin-win-hip-radeon-x64.zip
- Linux x86_64 CPU-only: bin-ubuntu-x64.tar.gz (~10 sec)
- Linux CUDA: app-*-linux-x64-cuda*.tar.gz
- Linux ROCm: source build with -DGGML_HIP=ON
Linux ROCm fast-path (bin-ubuntu-rocm-7.2-x64.tar.gz) is intentionally out of scope for this PR. The richer code path resolve_upstream_asset_choice at install_llama_prebuilt.py:3124-3183 already knows how to pick the right ROCm minor against the host runtime; porting that into direct_upstream_release_plan so AMD users on Linux also get a prebuilt is a clean follow-up.
Verification
Locally exercised the gate against synthetic PATHs covering every combination above. Routing matches the matrix, including the corner cases (MINGW* uname, aarch64 with NVIDIA tools, Darwin with NVIDIA tools). bash -n studio/setup.sh passes.
Test plan
- ubuntu-latest runner on a CPU job: install.sh log shows "prebuilt installed and validated" rather than "falling back to source build".
- Linux CUDA host: routes to unslothai/llama.cpp and picks the existing CUDA bundle.
- Linux ROCm host: routes to unslothai/llama.cpp, prebuilt resolution fails as before, and falls through to the existing source build with -DGGML_HIP=ON.
- macOS: unchanged (routes to ggml-org/llama.cpp via the Darwin branch).
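The synthetic-PATH check described under Verification could be reproduced along these lines. This is a sketch, not the script used for the PR: linux_has_gpu and pick_release_repo restate the gate and routing from the description as functions, and route_with_stubs is a hypothetical harness that creates stub tools in a temporary directory and evaluates the routing with PATH restricted to that directory.

```shell
#!/bin/sh
# Gate as described in the PR: true if any GPU detection tool is on PATH.
linux_has_gpu() {
    for tool in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do
        command -v "$tool" >/dev/null 2>&1 && return 0
    done
    return 1
}

# Routing as described in the PR. $1 = uname -s output, $2 = uname -m output.
pick_release_repo() {
    if [ "$1" = "Darwin" ]; then
        echo "ggml-org/llama.cpp"
    elif [ "$1" = "Linux" ] && [ "$2" = "x86_64" ] && ! linux_has_gpu; then
        echo "ggml-org/llama.cpp"
    else
        echo "unslothai/llama.cpp"
    fi
}

# Hypothetical harness: evaluate the routing with a PATH that contains only
# the named stub tools. $1 = system, $2 = machine, remaining args = stubs.
route_with_stubs() {
    system=$1
    machine=$2
    shift 2
    dir=$(mktemp -d) || return 1
    for stub in "$@"; do
        printf '#!/bin/sh\nexit 0\n' > "$dir/$stub"
        chmod +x "$dir/$stub"
    done
    # Subshell so the restricted PATH does not leak into the caller.
    ( PATH="$dir"; pick_release_repo "$system" "$machine" )
    rm -rf "$dir"
}

route_with_stubs Linux x86_64                # CPU-only -> ggml-org/llama.cpp
route_with_stubs Linux x86_64 nvidia-smi     # CUDA tool -> unslothai/llama.cpp
route_with_stubs Linux aarch64 nvidia-smi    # not x86_64 -> unslothai/llama.cpp
route_with_stubs Darwin arm64 nvidia-smi     # Darwin -> ggml-org/llama.cpp
```

Restricting PATH to the stub directory works here because the gate only uses shell builtins (command -v, [, echo), so no external binaries need to be reachable while it runs.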